Mining official data

Authors

  • Paula Brito
  • Donato Malerba
Abstract

In statistics, the term “official data” denotes data collected in censuses and statistical surveys by National Statistics Institutes (NSIs), as well as administrative and registration records collected by government departments and local authorities. These data are used to produce “official statistics” for making policy decisions and to help governments, government departments, local authorities, businesses, and the general public understand economic, social, demographic, and other matters of interest. For instance, population and economic census information is of great value in planning public services (education, fund allocation, public transport), as well as in private business (siting new factories, shopping malls, or banks, and marketing particular products). Moreover, NSIs regularly collect survey data on specific topics, such as the labour force, time use, and household budgets, to maintain up-to-date information on economic and social phenomena.

The application of data mining techniques to official data has great potential to support good public policy and to underpin the effective functioning of a democratic society. Nevertheless, it is not straightforward and requires challenging methodological research, which is still in its initial stages.

This special issue includes six papers that are updated and extended versions of papers selected from those presented at the Workshop on Mining Official Data, chaired by the guest editors of this issue in Helsinki in August 2002. The workshop was organized under the auspices of the European project KDNet (The Knowledge Discovery Network of Excellence) and within the framework of the 13th European Conference on Machine Learning (ECML’02) and the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’02).

Several directions can be distinguished in approaches to the problem of mining official data. In this issue, emphasis is placed on the following topics:

Geo-referencing. The practice of geo-referencing census data has spread steadily over the last few decades, and the techniques for attaching socio-economic data to specific locations have markedly improved over the same period. In the UK, for instance, household expenditure data are provided for each enumeration district (ED), the smallest areal unit for which census data are published. In addition, vectorized boundaries of the 1991 census EDs enable the investigation of socio-economic phenomena in association with the geographical location of EDs. These advances have created a growing demand for more powerful data analysis techniques that can link population data to their spatial distribution. In this context, a European project, SPIN, has been developed to address problems concerning geo-referencing. SPIN’s

Similar resources

Presenting a Model for Predicting Tax Evasion of Guilds Based on Data Mining Technique

In this research, considering the importance of the topic and the gap in previous research, a model for predicting tax evasion of guilds based on a data mining technique is presented. The analyzed data come from a review of 5600 tax files of all trades with tax codes in Qazvin province during the years 2013-2018. The tax files related to guilds fall into five tax groups, including the guild group o...

Data Mining and Official Statistics: The Past, the Present and the Future.

Along with the increasing availability of large databases under the purview of National Statistical Institutes, the application of data mining techniques to official statistics is now a hot topic, far more important than ever before. Presented in this article is a thorough review of published work to date on the application of data mining in official statistics, and on ...

Important Issues on Statistical Confidentiality Methods

This paper sets out, in the context of official statistics, some of the key issues of confidentiality and the methods developed to maintain confidentiality. The relevance of the issues and methods to data mining of official data is discussed. Recent developments that will increase the availability of microdata for scientific research are outlined.

Application of Open Data for Official Statistics, Case Study Data of Instagram Social Network

The notion of open data rests on the idea that users should have free access to data, so that they can reuse the data themselves and republish the results free from restrictions such as copyright and patents. Due to the ever-increasing growth of Information and Communication Technology (ICT), more data are produced every day, and this brings an excellent opportunity for National Statistical Offices (NSOs) ...

Symbolic representation based on trend features for biomedical data classification.

BACKGROUND: Widespread access to portable medical devices and new personal devices is boosting the amount of biomedical data. These devices generate a growing mass of data that far exceeds the analytical capacity of a professional doctor. Computer-assisted analysis of biomedical data has become an essential tool in medical diagnosis. OBJECTIVE: Due to the advantages of discrete, noise eli...

International Workshop on Current Challenges in Kernel Methods (CCKM 06)

The official 2006 kernel workshop "10 years of kernel machines". Koninklijke Vlaamse Academie van België. Sponsors: De Wetenschappelijke Onderzoeksgemeenschap (WOG) "Machine Learning for Data Mining and its Applications"; De Koninklijke Vlaamse Academie van België (KVAB); The PASCAL network of excellence.

Publication date: 2003